Abstract — The relationship between crop production and
evapotranspiration (ET) is very important to agronomists,
engineers, economists, and water resources planners. This
relationship is often determined using classical least
squares regression (LSR). However, a large number of samples
is needed to determine a probability distribution function,
and linear regression likewise requires many measurements to
obtain valid estimates of the crop production function
coefficients. In addition, deriving an ET-yield regression
for each crop and each district is usually expensive, since
lysimetric experiments must be repeated for several years
for each crop. The objective of this study is to introduce
fuzzy linear regression as an alternative to statistical
regression analysis for determining the coefficients of
ET-yield relations for each crop and each district with
minimum data. The application of possibilistic regression is
examined with a case study using two data sets for winter
wheat, from the Loess Plateau of China and the North China
Plain. The findings show the capability of possibilistic
regression to estimate crop yield under data shortage
conditions.

Keywords — Data shortage; evapotranspiration; fuzzy 
regression; grain yield; production function. 


I. INTRODUCTION 

Water shortage is the major constraint to agricultural 
production. The relationships between crop yield and water 
use have been a major focus of agricultural research in the 
arid and semi-arid regions (Zhang and Oweis, 1999). Water 
management is very important in these regions. Many 
researchers have studied the effect of deficit irrigation on 
crop production as a solution (Zhang et al., 1999 and Kang 
et al., 2002). 

In agricultural water management, the adequate
representation of production or crop yield functions is
crucial for modeling purposes in environmental economic
analyses. The discussion and estimation of different
functional forms have therefore gained much attention in the
agronomic and agricultural economics literature (Finger and
Hediger, 2007). Various functional forms have been
considered so far, but less attention has been given to the
estimation techniques. In general, crop yield is estimated by
least squares regression. Classical linear or non-linear
regression assumes that the measurement errors are
normally distributed and independent of each other. Since
many samples are needed to determine a probability
distribution, linear or non-linear regression requires at
least 8 to 30 measurements or observations to obtain valid
parameter estimates (Eslamian et al. 2012, Cheng Si and
Bodhinayake, 2005).

Measurement of some parameters of the yield function, such
as evapotranspiration, is expensive and time consuming.
Therefore, it is difficult and sometimes impossible to
obtain a simple yield function for regions with the same
climate. Moreover, the determination of evapotranspiration
is subject to different kinds of uncertainty. These arise
from human measurement errors and from assumptions about
deep percolation and the uniformity of the soil. In these
circumstances, classical regression may not give a valid
estimate of yield. In particular, a confidence interval
estimated from a few data points is very wide and may not
provide information that is useful for predictive
purposes (Eslamian et al. 2001, Cheng Si and Bodhinayake,
2005).

Fuzzy set theory can deal quantitatively with
uncertainty in experimental data or ambiguity in human
perception, and so it has been applied to various fields in
which uncertainty and/or ambiguity have a serious
influence. Unlike statistical methods, the theory does not
need strict assumptions about probability functions, such as
the normal distribution described above, and it can deal with
uncertainty more easily and more flexibly (Shimosaka et
al., 1996). The objective of this study is to investigate
whether fuzzy linear regression (Tanaka et al., 1982) can
predict crop production, and to provide a method for yield
forecasting that needs fewer observations than least squares
regression.

II. THEORY 
Water use-yield relationship: 

Crops consume water in the process of transpiration, and
water evaporates from the soil. These processes are defined
collectively as evapotranspiration (Thornthwaite, 1948).
The relationship between crop production and the amount of
water applied to the crop is important, and increasingly so
because of declining water resources and competition among
users.

Crop production models with resource and management
inputs have been widely used, particularly by agricultural
economists, and are called production functions (Vaux 1983,
Ostad-Ali-Askari et al. 2015). Hanks et al. (1969) reported
that dry matter is linearly related to evapotranspiration for
wheat, millet, oat and grain sorghum in both lysimeters and
field plots. Cole and Mathews (1923) and Mathews and
Brown (1938) investigated grain yield for winter wheat and
sorghum. They used linear regression techniques to evaluate
the yield-evapotranspiration relation as follows:

$$ Y = a\,ET + b \qquad (1) $$

where Y is grain yield (kg ha$^{-1}$), ET is the growing
season evapotranspiration (mm), and a (kg ha$^{-1}$ mm$^{-1}$)
and b (kg ha$^{-1}$) are regression coefficients.

ET is usually calculated for the growing season using the
soil water balance equation:

$$ ET = \Delta W + I + P + S_g - D - R_f \qquad (2) $$

where ET is the actual evapotranspiration, $\Delta W$ the
change in soil water storage between two soil moisture
content measurements, I the irrigation, P the rainfall,
$S_g$ the capillary rise from the lower soil layer to the
crop root zone, D the deep percolation from the crop root
zone, and $R_f$ the surface runoff (Kang et al. 2002). When
the groundwater table is more than 4 m below the ground
surface, $S_g$ is usually negligible (Zhang et al., 1999). It
is usually assumed that the soil infiltration rate is larger
than the rainfall and irrigation intensity.
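The water balance of Eq. (2) can be sketched as a small helper. This is only an illustration: the function name `seasonal_et`, its argument names and the numbers in the usage lines are hypothetical and are not taken from the cited experiments.

```python
def seasonal_et(delta_w, irrigation, rainfall,
                capillary_rise=0.0, deep_percolation=0.0, runoff=0.0):
    """Seasonal evapotranspiration (mm) from the soil water balance, Eq. (2).

    All terms are water depths in mm over the growing season.
    delta_w follows the sign convention of Eq. (2), where it enters
    with a plus sign.
    """
    return (delta_w + irrigation + rainfall + capillary_rise
            - deep_percolation - runoff)


# Hypothetical season: deep water table (S_g = 0), no drainage, no runoff.
et = seasonal_et(delta_w=60.0, irrigation=120.0, rainfall=150.0)
print(et)  # 330.0
```

The three optional terms default to zero, which corresponds to the simplifying assumptions mentioned above (negligible capillary rise, deep percolation and runoff).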

Some studies have shown that the empirical relation
between crop yield and seasonal evapotranspiration can take
different forms and that the empirical coefficients in the
relations vary with climate, crop type and variety, irrigation
method, soil texture, fertilizer and tillage methods. These
differences relate to regional variability in environment and
agronomic practices. Information specific to a region is
therefore needed to define a production function (Eslamian
et al. 2015, Kang et al., 2002, Ostad-Ali-Askari et al. 2016).
Consequently, deriving production functions for each region
would be expensive, and obtaining adequate data for linear
regression would be difficult.

Fuzzy linear regression method 

Fuzzy regression analysis was first proposed by Tanaka 
et al. (1982). Since membership functions of fuzzy sets are 
often described as possibility distributions, this approach is 
usually called possibilistic regression analysis (Tanaka et 
al., 1982). The basic concept of fuzzy regression is that
the residuals between estimators and observations are not
produced by measurement errors, but rather by the parameter
uncertainty in the model, and the possibility distribution is
used to deal with real observations
(Tseng et al., 1999, Eslamian et al. 2016). This method
provides the means by which the goodness of a relationship
between two variables, y and x, may be evaluated on the
basis of a small sample size. In this approach, the regression
coefficients are assumed to be fuzzy numbers (Sahin and
Hall, 1996, Ostad-Ali-Askari et al. 2017).

The fuzzy linear regression (FLR) model can be expressed
as:

$$ \tilde{Y}_i = A_0 + A_1 x_{i1} + \cdots + A_n x_{in} = A x_i \qquad (3) $$

where $x_i = [x_{i0}, x_{i1}, \ldots, x_{in}]$ is the vector of
independent variables in the ith data, $i = 1, \ldots, m$ (with
$x_{i0} = 1$, so that $A_0$ is the intercept), and
$A = [A_0, \ldots, A_n]$ is a vector of fuzzy parameters in the
form of symmetric triangular fuzzy numbers denoted by
$A_j = (p_j, c_j)$, $j = 0, 1, \ldots, n$, where $p_j$ is the
central value and $c_j$ the half-width (see Figure 1). The
membership function of $A_j$ is given by Eq. (4):

$$ \mu_{A_j}(a_j) = \begin{cases} 1 - \dfrac{|a_j - p_j|}{c_j}, & p_j - c_j \le a_j \le p_j + c_j, \\ 0, & \text{otherwise.} \end{cases} \qquad (4) $$

A fuzzy linear relationship can be represented by a band
(the bold lines, having membership = 0) with a centre line
(the dashed line, having membership = 1), as in Figure 2.

Therefore, Eq. (3) can be written as:

$$ \tilde{Y}_i = (p_0, c_0) + (p_1, c_1) x_{i1} + \cdots + (p_n, c_n) x_{in} \qquad (5) $$
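The membership function of Eq. (4) and the fuzzy output of Eq. (5) can be sketched as follows. The function names and the numbers in the usage lines are illustrative assumptions, not part of the original method description; the sketch relies on the standard result that a linear combination of symmetric triangular fuzzy numbers is again symmetric triangular.

```python
def membership(a, p, c):
    """Membership of a in the symmetric triangular fuzzy number (p, c), Eq. (4)."""
    if c == 0:
        return 1.0 if a == p else 0.0  # degenerate (crisp) number
    return max(0.0, 1.0 - abs(a - p) / c)


def fuzzy_output(x, coeffs):
    """Centre and half-width of the fuzzy output in Eq. (5).

    x      : regressors [x_1, ..., x_n]
    coeffs : [(p_0, c_0), (p_1, c_1), ..., (p_n, c_n)]
    """
    xs = [1.0] + list(x)  # x_0 = 1 multiplies the intercept A_0
    centre = sum(p * xi for (p, _), xi in zip(coeffs, xs))
    spread = sum(c * abs(xi) for (_, c), xi in zip(coeffs, xs))
    return centre, spread


# Illustrative numbers only (not taken from the study).
print(membership(11.0, 10.0, 2.0))                     # 0.5
print(fuzzy_output([3.0], [(1.0, 0.5), (2.0, 0.25)]))  # (7.0, 1.25)
```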


Fig. 1: Triangular representation of fuzzy numbers (vertical axis: membership value)

Fig. 2: A fuzzy linear relationship


Since the regression coefficients are fuzzy numbers, the
estimated dependent variable $\tilde{Y}$ is a fuzzy number.

Finally, the method uses the criterion of minimizing the
total vagueness, S, defined as the sum of the individual
spreads of the fuzzy parameters of the model:

$$ \text{Minimize } S = m c_0 + \sum_{i=1}^{m} \sum_{j=1}^{n} c_j x_{ij} \qquad (6) $$

The fuzzy coefficients are determined such that the
estimated fuzzy output $\tilde{Y}$ has the minimum fuzzy width
c while satisfying a target degree of belief h. The term h can
be viewed as a measure of goodness of fit, or a measure of
compatibility between the regression model and the data. Each
observation $y_i$ must fall within the estimated $\tilde{Y}_i$
at level h (Figure 3). The value of h lies between 0 and 1:
h = 0 indicates that the assumed model is extremely
compatible with the data, while h = 1 indicates that the
assumed model is extremely incompatible with the data. The
value of h is chosen by the decision maker, and the choice of
the h-level influences the widths c of the fuzzy parameters:

$$ \mu_{\tilde{Y}_i}(y_i) \ge h, \quad i = 1, 2, \ldots, m. \qquad (7) $$

Taheri et al. (2006) proposed a method of sensitivity
analysis based on the credible level h. Their results showed
that as the credible level h increases, the mean of predictive
capability (MPC) increases too. On the other hand, the total
vagueness of the model, S, also increases with h. To select a
suitable h, the variation of S with h is analyzed: S varies
gradually from h = 0 up to the optimal h, and beyond the
optimal h a further increase of h produces an abrupt change
in S.

The problem of finding the fuzzy regression parameters
was formulated by Tanaka et al. (1982) as a linear
programming problem:

$$ \text{Minimize } S = m c_0 + \sum_{i=1}^{m} \sum_{j=1}^{n} c_j x_{ij} $$

subject to:

$$ p_0 + \sum_{j=1}^{n} p_j x_{ij} - (1 - h) \Big( c_0 + \sum_{j=1}^{n} c_j x_{ij} \Big) \le y_i $$

$$ p_0 + \sum_{j=1}^{n} p_j x_{ij} + (1 - h) \Big( c_0 + \sum_{j=1}^{n} c_j x_{ij} \Big) \ge y_i \qquad (8) $$

$$ c_j \ge 0, \quad i = 1, \ldots, m, \quad j = 0, \ldots, n. $$

Eq. (8) is linear, thereby allowing the optimization problem
to be solved by means of linear programming.
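As a sketch of how Eq. (8) translates into solver input for a single regressor (ET), the helper below assembles the objective coefficients and the inequality rows in the common "row . v <= rhs" form. The function name `build_flr_lp` and the ordering of the unknowns are assumptions made for illustration; the data are the Nanpi observations from Table 1. The arrays could then be handed to any linear programming routine, with p0 and p1 left unbounded and c0, c1 constrained to be non-negative.

```python
def build_flr_lp(et, yields, h):
    """Objective and constraints of Eq. (8) for one regressor.

    Unknowns are ordered v = [p0, p1, c0, c1].
    Returns (objective, rows, rhs) such that the problem reads
        minimise  objective . v
        subject to rows[k] . v <= rhs[k] for every k, and c0, c1 >= 0.
    """
    k = 1.0 - h
    # S = m*c0 + sum(ET)*c1, following Eq. (6)
    objective = [0.0, 0.0, float(len(et)), float(sum(et))]
    rows, rhs = [], []
    for x, y in zip(et, yields):
        # lower edge of the h-level interval must not exceed y
        rows.append([1.0, x, -k, -k * x])
        rhs.append(y)
        # upper edge must not fall below y (multiplied by -1 to get "<=")
        rows.append([-1.0, -x, -k, -k * x])
        rhs.append(-y)
    return objective, rows, rhs


# Nanpi observations from Table 1 (ET in mm, yield in kg/ha).
et = [281, 355, 420, 418, 443, 456]
yields = [2800, 3010, 4060, 4940, 4750, 5160]
objective, rows, rhs = build_flr_lp(et, yields, h=0.7)
print(objective)  # [0.0, 0.0, 6.0, 2373.0]
print(len(rows))  # 12
```

Each observation contributes two rows, so m observations give 2m constraints in addition to the sign restrictions on the half-widths.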

Fig. 3: Triangular membership function of the fuzzy output (centred at $p_0 + \sum_j p_j x_{ij}$)

III. APPROACH 

The evapotranspiration (ET)-wheat yield data presented in
Kang et al. (2002) and Zhang et al. (1999) were used in this
study.

One of the databases consists of experimental irrigation
data, grain yield, seasonal ET, water use efficiency and a
climatic data summary for the winter wheat growing season at
four locations in the piedmont and lowland of the North
China Plain (Zhang et al., 1999). The locations are divided
into two groups that represent different geographic
characteristics of the region, based on the groundwater
table and geography. Luancheng and Gaocheng are located in
the piedmont of the Taihang Mountains, and Linxi and Nanpi
are located in the lowland of the Haihe floodplain. The
irrigation treatments ranged from no irrigation (rain-fed:
I0) to a maximum of seven irrigations (I1, I2, I3, I4, I5, I6
and I7, where the subscript represents the number of
irrigations during the crop-growing season) in Gaocheng and
Linxi, and to a maximum of five irrigations in Luancheng and
Nanpi. About 45-75 mm of water was applied at each
irrigation. Grain yield and seasonal evapotranspiration are
listed in Table 1.

The other database (Kang et al., 2002) consists of data from
a lysimeter experiment conducted for winter wheat (Triticum
aestivum L.) during the period 1995-1998 to evaluate the
effects of limited irrigation on grain yield on the Loess
Plateau of China. Kang et al. (2002) applied a controlled
soil water deficit, either mild or severe, at different
stages of crop growth. The average values of
evapotranspiration and grain yield for the different
treatments in 1995-1998 are given in Table 2.


Table 1: Grain yield and seasonal evapotranspiration for four locations in North China (Zhang et al., 1999)

Irrigation   Gaocheng          Linxi             Luancheng         Nanpi
treatment    ET     Yield      ET     Yield      ET     Yield      ET     Yield
             (mm)   (kg/ha)    (mm)   (kg/ha)    (mm)   (kg/ha)    (mm)   (kg/ha)
I0           242    2580       247    2610       264    3220       281    2800
I1           305    3600       277    3740       356    4770       355    3010
I2           365    4960       358    4670       379    5250       420    4060
I3           407    5230       414    4990       377    5250       418    4940
I4           437    5280       428    5120       439    5100       443    4750
I5           437    4240       426    4890       453    4790       456    5160
I6           419    4360       478    4940       -      -          -      -
I7           423    4950       489    4440       -      -          -      -


In the current study, linear fuzzy regression (Tanaka et
al., 1982) is employed, and evapotranspiration-yield fuzzy
relationships are obtained for Luancheng and Nanpi (Zhang et
al., 1999) and for the Loess Plateau of China (Kang et al.,
2002).

For this purpose, the complete datasets of Luancheng and
Nanpi are used. Zhang et al. (1999) combined the Luancheng
and Gaocheng datasets and presented a least squares
regression model for the piedmont; likewise, a least squares
model for Linxi and Nanpi was reported for the lowland. In
this study, fuzzy regression models are obtained for
Luancheng and Nanpi, and the Gaocheng and Linxi datasets are
used to validate the fuzzy regression models derived from
the Luancheng and Nanpi datasets, respectively.

Moreover, the dataset of eight different soil water
content treatments (1, 3, 5, 7, 9, 11, 13, 15) in 1995-1996
(Table 2) is used to obtain the ET-yield fuzzy regression
model for the Loess Plateau of China. Finally, for model
validation, the yield estimates of the fuzzy model for water
content treatments 2, 4, 6, 10, 12 and 14 are evaluated
against the observed data.

In these cases (having only 6 or 8 observations), it is
impossible to satisfy the basic assumptions of statistical
regression analysis (such as normality of errors,
independence of errors, and so on). Fuzzy regression can
therefore be used as an alternative approach.

The total vagueness (S) was calculated for h = 0 to 0.95 at
intervals of 0.05, and an acceptable value of h was
determined.
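The shape of the S versus h curve can be anticipated from Eq. (8). With crisp, non-negative inputs, substituting $c'_j = (1 - h) c_j$ removes h from the constraints, so the optimal centres do not depend on h and $S(h) = S(0) / (1 - h)$. This scaling is derived here from Eq. (8) and is not stated in the cited works; the short sketch below, with the hypothetical helper `vagueness_ratio`, tabulates the ratio over the grid of h values used in this study.

```python
def vagueness_ratio(h):
    """S(h) / S(0) for crisp data under Eq. (8); valid for 0 <= h < 1."""
    if not 0.0 <= h < 1.0:
        raise ValueError("h must satisfy 0 <= h < 1")
    return 1.0 / (1.0 - h)


# Grid used in the study: h = 0, 0.05, ..., 0.95.
grid = [i * 0.05 for i in range(20)]
for h in grid:
    print(f"h = {h:.2f}  S(h)/S(0) = {vagueness_ratio(h):6.2f}")
```

Under this scaling, S doubles between h = 0 and h = 0.5 but grows twentyfold by h = 0.95, which matches the gradual then abrupt behaviour described for the choice of h.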


Table 2: Total evapotranspiration and grain yield in three growing seasons on the Loess Plateau of China (Kang et al., 2002).

             1995-1996         1996-1997         1997-1998
Treatment    ET     Yield      ET     Yield      ET     Yield
             (mm)   (kg/ha)    (mm)   (kg/ha)    (mm)   (kg/ha)
1            267    2493       213    1750       220    1612
2            308    3520       300    3180       277    3060
3            304    3089       278    3375       231    2039
4            310    3533       385    3905       232    1771
5            301    3060       359    3570       310    4079
6            339    3506       291    3505       235    2040
7            356    3441       338    3870       296    3060
8            370    3659       387    4020       285    2788
9            362    3672       323    4080       254    3076
10           305    3680       389    4230       285    3852
11           292    3294       403    4245       227    2045
12           399    4233       519    4200       358    4060
13           354    4325       420    4600       330    4749
14           367    4485       383    4775       340    4811
15           370    4553       390    4920       329    4792


IV. RESULTS 

In applying fuzzy linear regression, grain yield (kg/ha) is
the dependent variable and evapotranspiration, ET (mm), is
the independent variable. All yield and ET values are
assumed to be crisp. The symmetric triangular form of the
membership function is chosen to represent the regression
parameters. According to Figure 4, S increases quickly for
large values of h. Values of h around 0.7 therefore appear
suitable, which agrees with Bardossy et al. (1990),
according to whom the level of credibility is generally
chosen so that $0.5 < h < 0.7$.

The fuzzy model with symmetric triangular fuzzy
coefficients for crop production modeling of winter wheat
at three locations in China, as a function of growing season
evapotranspiration, can be stated as follows:

$$ \tilde{Y} = (p_0, c_0) + (p_1, c_1)\,ET $$

Based on the 6 data in Table 1 for the Nanpi region, and
adapting relation (8), the objective function is (m = 6
observations; 2373 mm is the sum of the six ET values):

$$ \text{Minimize } S = 6 c_0 + 2373 c_1 $$

Fig. 4: Variation of the total vagueness (S) for different values of h (panels plotted against h, including Luancheng).

In addition, the 12 constraints related to the 6
observations must be formulated based on relation (8). For
example, the two constraints corresponding to the first
observation (ET = 281 mm, yield = 2800 kg/ha), with h = 0.7,
are:

$$ p_0 + 281 p_1 - 0.3 (c_0 + 281 c_1) \le 2800 $$

$$ p_0 + 281 p_1 + 0.3 (c_0 + 281 c_1) \ge 2800 $$

Minimizing the objective function S subject to the 12
constraints with linear programming methods gives the
following model coefficients:

$$ A_0 = (-1589.34, 0.00), \quad A_1 = (14.29, 4.44) $$

Therefore, the possibilistic regression model for the Nanpi
region is:

$$ \tilde{Y} = (-1589.34, 0.00) + (14.29, 4.44)\,ET $$
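The reported Nanpi coefficients can be checked against the constraints of relation (8) with a few lines of code. This is a consistency sketch rather than a re-derivation: it uses the rounded coefficients quoted above, so a tolerance of about 1 kg/ha is assumed when testing the constraints, and the helper names are hypothetical.

```python
# Nanpi observations from Table 1 (ET in mm, yield in kg/ha).
ET = [281, 355, 420, 418, 443, 456]
YIELD = [2800, 3010, 4060, 4940, 4750, 5160]

# Rounded coefficients reported for Nanpi: A0 = (P0, C0), A1 = (P1, C1).
P0, C0 = -1589.34, 0.00
P1, C1 = 14.29, 4.44
H = 0.7


def h_level_interval(et, h):
    """Lower and upper edge of the estimated yield at level h."""
    centre = P0 + P1 * et
    half = (1.0 - h) * (C0 + C1 * et)
    return centre - half, centre + half


def worst_violation(h):
    """Largest amount (kg/ha) by which an observation lies outside its interval."""
    worst = 0.0
    for et, y in zip(ET, YIELD):
        lo, hi = h_level_interval(et, h)
        worst = max(worst, lo - y, y - hi)
    return worst


total_vagueness = len(ET) * C0 + sum(ET) * C1
print(round(total_vagueness, 2))  # 10536.12
print(worst_violation(H) < 1.0)   # True
```

The computed S differs slightly from the tabulated value of 10538 because the half-width 4.44 is rounded to two decimals.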

In addition, the coefficients of the possibilistic
regression model were calculated for Luancheng and the
Loess Plateau of China. The results are shown in Table 3.

The results of the fuzzy regression models for the
simulation data are shown in Figure 5. The estimation area
is wider at high evapotranspiration than at low
evapotranspiration (Figure 5).


Table 3: The possibilistic regression models for the three sample areas with h = 0.7.

Location                 Model                                      Total vagueness (S)
Nanpi                    Y = (-1589.34, 0.00) + (14.29, 4.44) ET    10538
Luancheng                Y = (1026.98, 0.00) + (9.75, 4.82) ET      10942
Loess Plateau of China   Y = (-351.00, 0.00) + (11.95, 4.35) ET     11302

Fig. 5: Fuzzy regression relationships between winter wheat yield and ET at three locations in China (panels: Nanpi, Luancheng, Loess Plateau of China; horizontal axis: ET (mm)).


The variation of the estimation area illustrates that the
uncertainty of the simulated data changes along the ET axis.
The simulation results show that the estimation area
expresses the degree of dispersion at each
evapotranspiration value more practically than the
conventional regression method can. The area therefore not
only represents the relation between ET and grain yield but
also carries information on reliability, whereas the
conventional crop production function represents only the
relation between ET and yield.

The uncertainty in field data is caused by variation in the
climate of the region (drought, wind and frost), attacks by
insects and pests, etc.


Interestingly, the half-width of the intercept is
optimized to a value of zero during the minimization of the
vagueness criterion at all three locations (Nanpi, Luancheng
and the Loess Plateau of China; Table 3). Hence, the
intercept of the fuzzy regression model is a crisp number
and all of the fuzziness in the model arises from the slope
being a fuzzy quantity.

Figure 6 shows the fitness of the fuzzy regression. The
fuzzy regression models for estimating the coefficients of
the crop production functions in these regions are validated
with test data. Figure 6(a) shows the position of the
ET-yield data of the Linxi district relative to the
possibilistic regression model for the Nanpi region.

Fig. 6: Representation of the fitness of the fuzzy model using testing data (panels (a), (b) and (c); horizontal axis: ET (mm)).



According to Zhang et al. (1999), Linxi and Nanpi are
located in the lowland of the Haihe floodplain and represent
the same geographic characteristics of the region, based on
the groundwater table and geography. The model estimated for
Nanpi should therefore be applicable in Linxi. Figure 6(a)
shows that the Linxi data are in good agreement with the
fuzzy linear regression model derived for Nanpi. The derived
Luancheng regression model is verified with the Gaocheng
data (Figure 6(b)).
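One way to quantify this kind of validation is to compute the membership grade of each test observation in the fuzzy estimate and to count how many grades are positive. The sketch below does this for the Linxi data of Table 1 and the rounded Nanpi coefficients; it assumes that "inside the band" means a membership greater than zero, which is an interpretation and not a criterion stated in the text.

```python
# Possibilistic model reported for Nanpi: Y = (P0, C0) + (P1, C1) * ET.
P0, C0 = -1589.34, 0.00
P1, C1 = 14.29, 4.44

# Linxi observations from Table 1 (ET in mm, yield in kg/ha), treatments I0-I7.
LINXI = [(247, 2610), (277, 3740), (358, 4670), (414, 4990),
         (428, 5120), (426, 4890), (478, 4940), (489, 4440)]


def membership(et, y):
    """Membership of an observed yield in the fuzzy estimate at this ET."""
    centre = P0 + P1 * et
    spread = C0 + C1 * et
    return max(0.0, 1.0 - abs(y - centre) / spread)


grades = [membership(et, y) for et, y in LINXI]
inside = sum(g > 0.0 for g in grades)
print(f"{inside} of {len(LINXI)} test points lie inside the band")
for (et, y), g in zip(LINXI, grades):
    print(f"ET = {et} mm, yield = {y} kg/ha, membership = {g:.2f}")
```

The same loop could be applied to the Gaocheng data with the Luancheng coefficients from Table 3.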

Also, the fuzzy regression model for the Loess Plateau of
China is evaluated with 37 ET-yield data from this region
(Table 2). Figure 6(c) illustrates the capability of fuzzy
linear regression to estimate the production function
despite the shortage of data.


V. CONCLUSION

Fuzzy linear regression is used to estimate the
coefficients of the crop production function. For this
purpose, evapotranspiration-yield measurements of winter
wheat from three districts in China are used. Crop yield is
a sensitive parameter, and climate, soil, water and crop
alter the predicted yield. Evapotranspiration is the most
important factor in yield estimation. A crop production
function for each district is necessary for estimating
yield, but estimating it with classical least squares
regression requires many data. As this study shows, fuzzy
linear regression provides a convenient alternative for
characterizing crop yield under data-deficit conditions.
The degree of belief is determined by the method of Taheri
et al. (2006), and the model is validated with test data.
The approach is therefore suitable for predicting crop
yield from few data.